{
    "worksheets": [
        {
            "cells": [
                {
                    "source": "# Intrinsic structure of the Western-Arabic numerals scriptures\n\nUsing scikit-learn Spectral clustering.", 
                    "cell_type": "markdown"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 22, 
                    "input": "import numpy as np\nimport pylab as pl"
                }, 
                {
                    "source": "## Load the digits dataset and plot the first elements", 
                    "cell_type": "markdown"
                }, 
                {
                    "source": "Small utility function to display a gallery of images:", 
                    "cell_type": "markdown"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 23, 
                    "input": "def plot_images(images):\n    pl.gray()\n    pl.figure()\n    for i, img in enumerate(images[:25]):\n        pl.subplot(5, 5, i)\n        pl.imshow(img, interpolation=\"nearest\")\n        pl.xticks(())\n        pl.yticks(())\n    "
                }, 
                {
                    "source": "Lest load the digits dataset that comes with scikit learn (as a CSV file with gray level pixel values. Let's shuffle the dataset to make shure that the algorithm cannot exploit any ordering information. ", 
                    "cell_type": "markdown"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 24, 
                    "input": "from sklearn import datasets\nfrom sklearn.utils import shuffle\n\ndigits = datasets.load_digits()\nimages, data, target = shuffle(\n    digits.images, digits.data, digits.target)\n\nplot_images(images)\n"
                }, 
                {
                    "source": "## Group the pictures in 10 groups using Spectral Clustering", 
                    "cell_type": "markdown"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 25, 
                    "input": "from sklearn import cluster, neighbors\n\nn_clusters = 10\nS = neighbors.kneighbors_graph(data, 10)\nsc = cluster.SpectralClustering(n_clusters, mode='arpack', n_init=50)\nsc.fit(S)\nsc.labels_"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 26, 
                    "input": "for i in range(n_clusters):\n    plot_images(images[sc.labels_ == i])"
                }, 
                {
                    "source": "\n## Profiling clustering algorithm\n\nThe following will runt the `cProfile` tool from the Python stdlib and display the output in a paged, tiled panel.", 
                    "cell_type": "markdown"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 27, 
                    "input": "%prun cluster.SpectralClustering(10, mode='arpack').fit(S)"
                }, 
                {
                    "source": "## Supervised learning: learning to classify digits", 
                    "cell_type": "markdown"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 28, 
                    "input": "from sklearn import svm, metrics\nX_train, y_train, X_test, y_test = data[:500], target[:500], data[500:], target[500:]\n\nclf = svm.SVC(gamma=0.001).fit(X_train, y_train)\n\nprint metrics.classification_report(y_test, clf.predict(X_test))"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 29, 
                    "input": "cm = metrics.confusion_matrix(target[500:], clf.predict(data[500:]))\nprint cm"
                }, 
                {
                    "cell_type": "code", 
                    "language": "python", 
                    "outputs": [], 
                    "collapsed": true, 
                    "prompt_number": 30, 
                    "input": "pl.imshow(cm)"
                }
            ]
        }
    ], 
    "metadata": {
        "name": "structure_digits"
    }, 
    "nbformat": 2
}